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The microscopic behavior of water under different conditions and in different environments remains the subject 
of intense debate. A great number of the controversies arises due to the contradictory predictions obtained 
within different theoretical models. Relative to conclusions derived from force fields or density functional 
theory, there is comparably less room to dispute highly-correlated electronic structure calculations. Unfor- 
tunately, such ab initio calculations are severely limited by system size. In this study, a detailed analysis of 
the two- and three-body water interactions evaluated at the CCSD(T) level is carried out to quantitatively 
assess the accuracy of several force fields, density functional theory, and ab initio-based interaction poten- 
tials that are commonly used in molecular simulations. Based on this analysis, a new model, HBB2-pol, is 
introduced which is capable of accurately mapping CCSD(T) results for water dimers and trimers into an 
efficient analytical function. The accuracy of HBB2-pol is further established through comparison with the 
experimentally determined second and third virial coefficients. 



I. INTRODUCTION 

Connecting small clusters of water and the condensed 
phases of water through a single molecular model has 
been a long sought-after but so-far unachieved goal. The 
challenges involved in the pursuit of this goal are nu- 
merous. For example, at the cluster level, the Born- 
Oppenheimer energies of topologically distinct isomers 
of the water hexamer differ by less than 1 kcal/moP^, 
indicating that highly-correlated electronic structure cal- 
culations are required to quantitatively determine the en- 
ergy order of these isomers. In this regard, a faithful 
description of molecular flexibility appears to be partic- 
ularly importantP Furthermore, it has also been shown 
that nuclear quantum-mechanical effects can impact the 
structural, thermodynamic, and dynamical properties of 
both clusters and bulk phases of waterPHS The explicit 
inclusion of nuclear quantum effects in simulations exac- 
erbates the computational expense of a model, provid- 
ing additional strain on the ability to obtain statistically 
meaningful results. 

The majority of water simulations rely on force fields, 
which are built upon the many-body expansion of the 
interaction energyP 

N N 

E(1,...,N) = J2V 1B ( 1 ) + J2 V2B ^J) 

i i<j 

N 

+ £ V 3B (i,j,k) + .-- + V NB (l,...,N). (1) 

i<j<k 

Here, V 1B (i) = E(i) - E eq (i) is the one-body (IB) po- 
tential, which describes the energy required to deform 
an individual molecule from its equilibrium geometry. 
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In common force-fields, the IB interactions include all 
bonded terms (i.e., stretches, bends, and torsions). For 
systems, such as water, that are easily reduced to dis- 
tinct molecules, a IB configuration is typically referred 
to as a "monomer", and groups of 2, 3, ... , N interact- 
ing monomers are then termed "dimers", "trimers", 
"JV-mers" . In Eq. [T] higher-order interactions are defined 
recursively through the lower-order terms. For instance, 
the two-body (2B) interaction is expressed as 

2 

where £'(1,2) is the dimer energy. Similarly, the three- 
body (3B) interaction is 

3 3 

^ 3S =£(l,2,3)-^£(z,j)+^£( l ) 

i<j i=l 

with £(1,2,3) being the trimer energy. Common force 
fields are pairwise additive, meaning that three-body and 
higher interactions are neglected. 

If it converges quickly, the many-body expansion repre- 
sents a powerful approach to studying condensed phases 
as it allows for the energy of an .ZV-molecule system 
to be expressed as a sum of lower-order interactions 
that can in principle be calculated with high accuracy. 
Recently, a detailed study of the convergence of the 
many-body expansion for water based on the analysis of 
small clusters was performed using coupled cluster theory 
with single, double, and perturbative triple excitations 
[CCSD(T)] and large basis setsP Consistent with pre- 
vious observations^"^, it was determined that, although 
two-body interactions dominate the expansion, the three- 
body term can contribute up to 30% of the total energy 
of the water hexamer. An estimate of the relative mag- 
nitudes of the many-body terms in liquid was obtained 
through an RIMP2 analysis of the 21-mer, for which two- 
body interactions were found to contribute 75-80% of the 
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total interaction energy and three-body interactions com- 
prised 15-20%P^For both the water hexamer and the 21- 
mer, higher-order terms contribute less than 5% of the 
total interaction energy. It should be noted that, while 
quickly converging for water, the many-body expansion 
has been shown to converge slowly and with marked os- 
cillatory behavior for other systems!^ 

In this study, the accuracy of several force fields, den- 
sity functional theory (DFT), and ab initio potentials in 
reproducing the two- and three-body water interactions 
is assessed through a detailed comparison with data ob- 
tained at the CCSD(T) level of theory ( Sect ion [n). Based 
on this analysis, a new ab initio water model, HBB2-pol, 
is then introduced in Section [TTTj In Section |TVj we show 
that HBB2-pol accurately maps the CCSD(T) results for 
both the 2B and 3B interactions into an efficient analyt- 
ical functional form and predicts the second and third 
virial coefficients in excellent agreement with the avail- 
able experimental data. A summary is given in Section 

m 



II. ANALYSIS OF TWO- AND THREE-BODY WATER 
INTERACTIONS 

A. Water models 

The many-body expansion provides the underlying ba- 
sis for common classical force fields. In most case s, in- 
cluding the widely-used TIP4P and SPC familieiPMI 
pairwise additivity is assumed, with three- and higher- 
body interactions being "encoded" into the effective two- 
body contributions. In addition, the majority of these 
models treat the water molecules as rigid monomers 
(i.e., the IB interactions are set to zero), with only 
few quant um w ater models, notably as q-TIP4P/f and 
qSPC/Fwj2212U allowing for molecular flexibility. Never- 
theless, pairwise force fields have been surprisingly suc- 
cessful at reproducing, at least qualitatively, the proper- 
ties of water in homogeneous environments.^ However, 
such force fields are expected to be inherently limited in 
their ability to model the microscopic behavior of aque- 
ous interfaces, water confined at the nanoscale, and clus- 
ters, whose properties are sensitive to the detailed inter- 
play of IB, 2B, 3B, and higher-body interactions PEEU 

Recent work has focused on improving empirical mod- 
els through inclusion of three-body i nterac tions, leading 
to the development of the E3B modelP^H Although the 
inclusion of explicit 3B interactions greatly improves the 
accuracy of the E3B model relative to pairwise force 
fields, the use of rigid water monomers and empirical 
parameterization necessarily misses some of the funda- 
mental properties of the many-body expansion. For ex- 
ample, recent E3B simulations of the isomeric equilibria 
of the water hexamer have led to predictions that are 
markedly different from ab initio calculations. Specifi- 
cally, the prism structure, which corresponds to the en- 
ergetically lowest-lying isomer at the MP2 and CCSD(T) 



levels of theoryj^El is unstable in the E3B calculations.^ 

Since non-pairwise additive intcrmolccular interactions 
arise primarily from electronic polarization at long dis- 
tances, several methods have been proposed to incor- 
porate this effect into the framework of classical force 
fieldsPS One common approach is the Applequist polariz- 
able point dipole modelj^ which was elaborated upon by 
Thole to address the so-called polarization catastrophe.^ 
Thole-type polarizable force fields for water include 
TTM3-FS1 TTM4-F2SI, and AMOEBA^ models. 

Among methods that attempt to solve directly the 
many-body problem from "first principles" , semiempiri- 
cal models represent an attractive alternative due to their 
computational efficiency. Semiempirical models such as 
PM^and PM3-MAIS^were derived within the MNDO 
scheme and differ primarily in the form of the core-core 
repulsion as well as in the precise values of their ad- 
justable parameters. These models were parametrized 
either by fitting experimental data for a wide variety 
of systems (PM3) or ab initio reference data in the 
case of PM3-MAIS. Due to the use of a minimal ba- 
sis and the explicit neglect of correlation, semiempiri- 
cal methods are particularly limited in their ability to 
describe non-bonded interactions. This deficiency has 
been addressed by the SCP-NDDO model, which aug- 
ments traditional semiempirical methods with classical 
polarization. 34 SCP-NDDO has shown success in mod- 
eling water clusters and has recently been extended to 
simulations of bulk properties.^ 

Different DFT methods has also been extensively used 
to the study of condensed phases, pri marily through the 
use of GGA functionals such as BLYP^122I and PBEP 
However, common density functionals are by construc- 
tion limited in their ability to describe weakly interact- 
ing van der Waals complexes. One attempt to address 
this problem involves the addition of a dispersion cor- 
rection to the energy through the Cq/R 6 term, where 
the Cg parameters are atom and basis-set specific^ 4 ^. 
These "DFT-D" models have successfully described sys- 
tems such as the solvation of iodide in watei^, but are 
limited by the need to develop parameters for each func- 
tional/basis set p2l Furthermore, because the correction is 
pairwise additive, it neglects higher-body dispersion con- 
tributions. Recent work to address this limitation has 
been reported.^ 

A promising alternative to the pairwise DFT-D correc- 
tion is represented by the non-local van der Waals (nl- 
vdW) functionalsP^"^ These nl-vdW functionals utilize 
the electron density to define a non-local correlation con- 
tribution to the exchange-correlation functional, leading 
to a consistent description of both short-range and long- 
range interactions. Since no atomic or basis-set depen- 
dent parameters are required to describe the dispersion 
interaction due to the explicit dependence of the non- 
local correlation on the electron density, nl-vdW func- 
tionals, in principle, require minimal parameterization 
and are system- independent. In practice, great care must 
be taken to avoid double counting of correlation effects in 
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the combination of semi-local and non-local terms. Van 
der Waals density functionals have recently been applied 
to the study of liquid watei^ 3 and icd^P. 

One final class of models is represented by the ab ini- 
tio-based interaction potentials. These models are built 
upon a rigorous treatment of the many-body expansion 
of interactions and are characterized by having a func- 
tional form that is sufficiently flexible to accurately map 
high-quality ab initio refere nce da ta. Examples of such 
models are DPP2j±2l GC-polJ^I an d WHBBP DPP2 
and CC-pol are restricted to the rigid, vibrationally av- 
eraged monomer geometries, while WHBB uses permu- 
tationally invariant polynomials to represent the flexible 
monomer 2B and 3B potential energy surfaces (PESs). 
Such ab initio-b&sed interaction potentials are quite com- 
putationally demanding and are most commonly used in 
calculations for gas phase systems^, although bulk prop- 
erties have b een obtained from classical simulations with 
CC-poPMland DPP2 12 -. Very recently, a flexible version 
of CC-pol has been developed, CC-pol-8s/, and the effects 
of flexibility on the dimer vibrational-rotation-tunneling 
(VRT) spectra have been characterized.^ It was found 
that both CC-pol-8s/ and HBB2 (the 2B potential of 
WHBB) reproduce the experimental VRT spectra "about 
equally well" P 



B. Comparison to CCSD(T) 

Here, we assess the ability of the models presented 
in Section III Al to describe the 2B and 3B water inter- 
actions. Roughly 1400 2B interactions and 500 3B in- 
teractions were evaluated at the CCSD(T)/aug-cc-pVTZ 
leve l 54 * 55 -* and corrected for the basis set superposition 
error (BSSE) using the counterpoise method. 56 These 
(flexible) molecular configurations were extracted from 
1) classical molecular dynamics (MD) simulations of hex- 
amers at T < 30 K using the WHBB potential, 2) clas- 
sical MD simulations of ice Ih carried out with TTM3-F 
at 50 K, and 3) classical MD simulations of bulk wa- 
ter at 298 K and experimental density using TTM3-F. 
Hereafter, these configurations are referred to as "low- 
energy" configurations. For the analysis of E3B, the 
CCSD(T) reference interaction energies were recomputed 
for "rigidified" molecules corresponding to the flexible 
configurations that were used in the comparison of the 
other models. All DFT energies were computed using the 
aug-def2-TZVPP basisP^ 3 with the exception of BLYP- 
D, for which the TZVPP basis w as used as in the orig- 
inal parametrization of the model ™* 40 * 58 * MP2 energies 
were computed with the aug-cc-pVTZ basis, and both 
DFT and MP2 interactions were corrected for BSSE. All 
ab initio calculations were performed using the freely- 
available ab initio package ORCA^. PM3 and PM3- 
MAIS energies were calculated using the AMBER/ SQM 
semi-empirical package®', while the SCP-NDDO energies 
were obtained using CP2KP^ A linear regression anal- 
ysis for the data presented in Figures [l] and [2] as well as 



root mean square error with respect to CCSD(T) data, 
are presented in the supporting material. 

Figures [T] and [2] show correlation plots for the 2B and 
3B interactions calculated for all models described in 
Section |HA] relative to the CCSD(T)/aug-cc-pVTZ en- 
ergies. While most empirical pairwise force fields implic- 
itly include nuclear quantum effects, models such as q- 
TIP4P/f and q-SPC/Fw were specifically parameterized 
for quantum simulations and, therefore, presumably pro- 
vide an ap proximation to the actual Born-Oppcnhcimcr 
PES.I2SIII] As can be seen from Figure |TJ q-TIP4P/f devi- 
ates substantially from the CCSD(T) 2B potential en- 
ergy surface to compensate for the neglect of higher- 
order interactions (Figure [2| . Force fields that account 
for higher-order terms generally provide a more accurate 
description of the 2B interactions than effective pairwise 
models. In this context, while E3B and TTM3-F /TTM4- 
F/ AMOEBA treat higher-order interactions using differ- 
ent schemes, all four models give 2B interactions that are 
in closer agreement with the CCSD(T) results than the 
effective pairwise models. It is interesting to note that 
E3B, which does not not explicitly include induction and 
was not parameterized using ab initio data, describes the 
3B contributions energies reasonably well. 

The three polarizable models considered in this study 
(TTM3-F, TTM4-F, and AMOEBA) differ in the way 
they describe the variation of the molecular charge dis- 
tribution. As an isolated monomer deforms, the molec- 
ular dipole moment varies in a "nonlinear" fashion 
with respect to the intramolecular coordinates, result- 
ing in a "nonlinear dipole moment surface" (DMS)P^ 
In TTM4-F, the first-order changes of the DMS are 
fit to electric multipoles and polarizabilites calculated 
at the MP2 level. The intramolecular dependence of 
the atomic charges in TTM3-F was instead motivated 
by the observation that, while the gas phase monomer 
charges decrease during the homolytic dissociation, a 
water molecule in the condensed phase dissociates into 
charged ions. This argument was used to justify an 
empirical correction to ab initio-devrved values, giving 
rise to effective charges that increase as the monomer 
geometry departs from equilibrium. By contrast, al- 
though AMOEBA takes into account intramolecular flex- 
ibility, the monomer charges are geometry independent.^ 3 
Interestingly, while the accurate monomer DMS has 
been reported to be essential to reproducing the sol- 
vated monomer geometry} 6 ^* the three-body interaction 
of AMOEBA is only slightly less accurate than TTM4-F, 
with RMS errors of 0.22 and 0.09 kcal/mol respectively. 
It is also important to mention that, unlike in TTM3-F, 
in both AMOEBA and TTM4-F the molecular polarz- 
ability is anisotropic. It is unclear whether the inaccu- 
racy observed in the TTM3-F 3B energies arises from its 
use of effective charges, isotropic molecular polarizability 
or both. 

Among the semiempirical methods, PM3 was fitted to 
a wide range of experimental and ab initio data, while 
PM3-MAIS and SCP-NDDO were both fitted to ab ini- 
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FIG. 1. Correlation plots for the 2B interactions. Plotted on the x-axes is the BSSE-corrected CCSD(T)/aug-cc-pVTZ energies. 
On the y-axes are the energies for each model. Empirically parametrized models are in orange, polarizable models in green, 
semiempirical methods in red, DFT methods in blue, and MP2 in maroon. For DFT and MP2, the colored dots are BSSE- 
corrected energies, while gray dots are BSSE-uncorrected energies. The new ab initio-based model, HBB2-pol, is in violet. 



tio reference data of water clusters. It is therefore not 
surprising that the 2B interactions of PM3-MAIS and 
SCP-NDDO are in better agreement with the CCSD(T) 
data than PM3. It is interesting, however, that SCP- 
NDDO shows much tighter correlation to the ab ini- 
tio data than PM3-MAIS, even though the latter uses 
almost twice as many adjustable parameters as SCP- 
NDDO. While the two MNDO-typc semiempirical meth- 
ods display significant deficiencies in describing the 3B 
interactions, SCP-NDDO reproduces the CCSD(T) data 
quite accurately. These results suggest that the addition 
of classical polarization, as implemented in SCP-NDDO, 
can allow semiempirical methods to accurately describe 
intermolecular interactions without requiring extensive 
reparametrizations of the core-core terms. 

At the 2B level, the GGA density functionals differ 
appreciably from the CCSD(T) results (see Supporting 
Information for PBE and PBEO results), with BLYP 
systematically underestimating the interaction strength. 
The inclusion of the dispersion correction in BLYP-D im- 
proves the agreement with the CCSD(T) values for the 
2B interactions. However, although DFT is less sensitive 



to basis set incompleteness than wavefunction methods, 
the absence of diffuse functions in the BLYP-D basis re- 
sults in a large BSSE correction (see figure [I] where blue 
circles give the BSSE-corrected interaction and gray cir- 
cles the BSSE-uncorrected interaction). Indeed, BSSE is 
so small for BLYP, B3LYP, RPW86PBE, and VV10 that 
it is barely visible in Figures [T] and [2] While BLYP-D 
can accurately describe the 2B interactions when a suf- 
ficiently large basis set is used or the energy values are 
corrected for BSSE, how to balance these factors in con- 
densed phase simulations is n ot str aightforward and is 
the subject of ongoing research! 42 ! 64 ! 

Whil e th e use of hybrid functionals, such as 
B3LYP^M1, results in a much tighter correlation to the 
CCSD(T) data than GGA functionals, B3LYP nonethe- 
less inherently suffers from inadequate treatment of dis- 
persion interactions, which leads to an incorrect long- 
range behavior.^ Among recent nl-vdW functionals, 
VVIO appears to over-correct its parent functional, 
RPW86PBE, leading to over bound 2B interactions. All 
DFT methods perform reasonably well for the 3B inter- 
actions. It is important to note that, because the dis- 
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FIG. 2. Correlation plots for the 3B interactions. Plotted on the x-axes is the CCSD(T)/aug-cc-pVTZ energies corrected for 
BSSE. On the y-axes are the 3B energies for each model. The color scheme is the same as in Figure [I] 



persion correction is pair additive, BLYP and BLYP-D 
provide identical 3B interactions. By contrast, nl-vdW 
functionals include a three-body dispersion correction, 
although this is almost negligible for VV10 (see Support- 
ing Information). MP2 agrees well with CCSD(T), with 
an RMS of 0.03 and 0.02 kcal/mol for the 2B and 3B 
interactions, respectively. Consistent with previous ob- 
servations, the magnitude of BSSE is much smaller for 
3B than 2B interactions. 1 ^ 

With the exception of MP2, WHBB provides the low- 
est RMS for the 2B interactions. WHBB employs a 
permutationally invariant polynomial with 5227 coeffi- 
cients that were fit to reproduce ~30000 CCSD(T)/aug- 
cc-pVTZ 2B interactions. To account for basis-set 
truncation, the reference 2B interactions were chosen 
as weighted averages of BSSE-corrected and BSSE- 
uncorrected CCSD(T) interactions.^. Since both 
WHBB and CC-pol reproduce the VRT spectrum of the 
water dimer with comparable accuracy,^ a similar agree- 
ment with the CCSD(T) data at the 2B level is also ex- 
pected for CC-pol. The agreement of WHBB with the 
CCSD(T) values for the 3B interactions is less satisfac- 
tory, with WHBB increasingly underestimating the ener- 
gies of the lowest-lying trimers. Results for the HBB2-pol 



model will be discussed in the following sections. 



III. METHODS 



Due to its rapid convergence for water, the many-body 
expansion of interaction energies provides a viable way to 
"scale up" the CCSD(T) level of accuracy to a large num- 
ber of molecules. Furthermore, by accurately fitting the 
IB, 2B, and 3B interactions into a relatively inexpensive 
function, simulations of condensed phases at an effective 
CCSD(T) level of accuracy become feasible. For flexi- 
ble monomers, the most sophisticated effort along these 
lines, WHBB,53 has indisputably proven this concept. 
However, WHBB is not directly applicable to bulk phase 
simulations due to its prohibitively expensive 3B term. 
Motivated by this observation, this section reports the 
development of a new model, HBB2-pol model, begin- 
ning with a discussion of the 3B interaction. 
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A. Three-body Interaction 

Our development exploits the fact that the 3B inter- 
action in water arises primarily from induction, with all 
other contributions vani shing quickly as the intermolec- 
ular separation increases! 68 ^ ' This naturally leads to the 
following ansatz: 



3B 

HBB2-pol 



3B 



(2) 



that represents the 3B interaction as the sum of an in- 
duction term, V^f, and a short-range "correction" , V^ y . 
The physical origins of V^ y are related to the breakdown 
of the assumptions made in the derivation the Thole-type 
induction term as well as to the quantum-mechanical 
contributions a ssociate d with 3B exchange-repulsion and 
charge transfer PLMZU The induction scheme of TTM4-F 
is used in Vf®d ^ ue t° ^ s superior accuracy with respect to 
other polarizable models (see Fig. [2]). The short-ranged 
nature of the "correction" is enforced explicitly by the 
switching function, S3, 

S3 = /(fcO/tfia) + /(£l2)/(& 3 ) + /(£l 3 )/(6 3 ), (3) 

where 



This form was found to be better capable of including 
all trimers in the first solvation shell of a central water 
molecule (in particular, the "linear" trimers) than those 
based on maximum oxygen-oxygen separations. Impor- 
tantly, while the switch in Equation ^ goes from 3 to 
as the trimer passes from the short-range to the long 
range, the product s^V^ y is fitted, rather than V^ y by 
itself. This ensures that no artifact is introduced due to 
the switching. However, because szV^ y is fitted in the 
context of V^, this also implies that, unlike in the case 
of the WHBB 3B polynomial, V^ ly of HBB2-pol has no 
meaning by itself but only as the sum szV^ y + Vf® d - 

The ability of the short-range polynomial, V^ y , to 
accurately fit reference data depends largely on its de- 
gree, which also determines the associated numerical 
cost. Consequently, the large 5th and 6th degree 3B 
polynomials in V^ y of WHBB constitute the most com- 
putationally taxing part of the model. The different rep- 
resentation of the 3B interactions (see Eq. ([2])), along 
with the improved description of the induction energies 
in HBB2-pol allows for an accurate fit of the CCSD(T) 
reference data with a lower-degree polynomial. Specifi- 
cally, in HBB2-pol the V^ y part is a sum of second and 
third degree symmetrized products of exponentials of the 
intermolecular separations 



f(0 = 



< £ < 1 



(4) 



and 



= [Rij - Ri)/{Rf - Ri), Rij = |rf - i 
denotes the position of the n-th molecule oxygen atom. 



Tjij = cxp ( - k\ri - Tj\). 



(•5) 



where k is an adjustable parameter. Neither intramolec- 
ular distances nor the two-body terms - those which do 
not depend on the positions of all three molecules simul- 
taneously - were included into the Vg£ y in HBB2-pol. 
Labeling the three molecules as a, b, and c, there are 27 
distances that contribute: 



Vi = 


:e -fc HH rf(Hal,Hbl) ) m = 


e -fc HH d(Hal,Hb2) ) 


m = 


= e -fc HH rf(Hal,Hcl) ) 


m = 


= e -fc H Hd(Hal,Hc2) ; ^ = 


e -fc H Hd(Ha2,Hbl) ) 


m = 


= e -fc HH d(Ha2,Hb2) j 


m - 


= e -A :H Hrf(Ha2,Hcl) ! % = 


e -fc HH rf(Ha2,Hc2) 5 


V9 = 


= e -fc H H<i(Hbl ) Hcl) ) 


Vw = 


e -fc HH d(Hbl,Hc2) 5 m = 


c -fc HH d(Hb2,Hcl) ; 


V12 


= c -WKHb2,Hc2) ; 


Vl3 


= e -fc Hrf(Oa,Hbl) ! vu = 


= e -fc OH rf(Oa,Hb2) 5 


Vl5 


= e -fc Hd(Oa,Hcl) 


vie 


= c -fco H d(Oa,Hc2) ; Vi7 = 


= e - fc o H rf(Ob,Hal) ; 


Vis 


_ A -A:oHrf(Ob,Ha2) 

> 




= c -fc O Hd(0b,Hcl) ; mo __ 


= e -fc OH d(Ob,Hc2) ; 


mi 


= e -fcoH^(Oc,Hal) ) 


mi 


= e -fco H rf(Oc,Ha2) j m3 __ 


= e -fc OH d(Oc,Hbl) 5 


mi 


= e -A:oHd(Oc,Hb2) i 


?725 = e-feod(Oa,Ob) ) m& 


= G -fcoorf(Oa, Oc)^ 


mi 


= e -WKOb,Oc) ; 



where g?(X,Y) stands for the distance between atoms X 
and Y (see also Eq.([5])). The monomials were constructed 
by symmetrizing the products of the rj n variables with 
respect to the permutations of both the molecules and 



the hydrogen atoms within each molecule (48 elements 
in the permutation group total). A total of 131 different 
monomials were identified (13 are of second degree, and 
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the remaining 118 are of third degree): 

«1 = V5V7 + V7V9 + V10V4 + V5V9 + V6V7 + VllV2 

+ mm + mm + mm + mim + vwm + mm 
+ mim + mm + mm + mim + m^m + mm 
+ mm + V12V8 + mim + m2m + mam + vwvs, 

«48 = m^m^m + momzm& + msmzm + viomemz 
+ m^mzm + vumsm + mamzm + m^mzm 

+ miVume, + mvmmz + ?7i2?7i4?7i8 + vwrnm 

+ mmiwrni + m%m\m + vwm^m + vnVuVn 

+ Vi3msm + '7i2%6%2 + »7ii?7i5%2 + V^VuVn 

+ m$mm\ + vuvism + mom^m + vuvnm 
+ m^mim + mzmim + mom^m + vwiism 
+ m^m^m + m^mzm + mim§m\ + m^m^m 
+ msvnm + mom^m + m§m-im + mm^mz 
+ mw^m + vuvnV3 + V20V24V6 + m^mim 
+ momzvn + mzm&m + mt>mim + m\m$m\ 
+ mem2V6 + memmi + vwm^m + mam^m 

«i3i = m^m&mi- 

The VpJl y itself was then taken as a linear combination 
of K n : 

131 

V poiy = J2 Vnfin > ( 6 ) 

71=1 

with the coefficients v n obtained using the least squares 
fitPto the CCSD(T) data. The gradient of V^ y with 
respect to the atomic positions was computed using 
MAPLEP 



B. Composition of three-body training set 

After translational and rotational invariance, the 3B 
interaction potential for flexible water molecules is 21 di- 
mensional. Since this high-dimensional PES cannot be 
readily trained on a grid, a training set representative 
of the "important regions" of the 3B PES was gener- 
ated by including: 1) repulsive configurations with pos- 
itive binding energies that are compressed relative to 
the trimer global minimum, 2) "low-energy" configura- 
tions with thermally accessible binding energies, and 3) 
long-range trimer configurations that have weak 3B in- 
teractions. The total training set consists of 8019 trimer 
configurations for which the 3B energies were computed 
at the CCSD(T)/aug-cc-pVTZ level and corrected for 
BSSEP 

The majority of the configurations in the training set 
correspond to an expanded set of "low-energy" configura- 
tions, similar in composition to that used in the analysis 
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FIG. 3. Distribution of O-O distances in the three-body train- 
ing set, including all three O-O distances per trimer. "Low- 
energy" denotes configurations which were thermally accessi- 
ble (see text for details). 



of section [TT] Of the 5515 thermally accessible configura- 
tions that were used in the fit, 996 trimers were selected 
from MD simulations of clusters (trimers and hexamcrs) 
at 30 K on the WHBB potential energy surface, 3311 
trimers were extracted from classical MD simulations of 
hexagonal ice and liquid water carried out with TTM3-F 
at 50 K and 300 K respectively, 792 trimers were ob- 
tained by randomly orienting water monomers in geome- 
tries near the global minimum, and 416 trimers were ob- 
tained from scans of low-energy structures. 

The long-range portion of the training set consists of 
432 weakly interacting trimers, with an average 0-0 dis- 
tance of at least 5.5 A between monomers. After verify- 
ing that the CCSD(T)/aug-cc-pVTZ 3B interaction en- 
ergy for these long-range configurations was in agreement 
with the 3B induction energy from TTM4-F (within the 
error associated with basis-set truncation), we assigned 
these configurations the TTM4-F 3B induction energy. 
This enforces the "boundary condition" that the 3B in- 
teraction of HBB2-pol become pure induction at long dis- 
tances. 

It is important to mention that our initial model were 
fitted to a training set that emphasized only the low- 
energy and long-range regions, which is consistent with 
the parameterization strategy adopted for the WHBB 
3B potential. While the resulting models succeeded in 
predicting the relative stabilities of trimer and hexamer 
isomers, we found that they were numerically unstable 
in MD simulations of larger clusters, such as the 32-mer. 
This was related to insufficient coverage of the repulsive, 
short-ranged region of the trimer PES. To address this in- 
stability, 2072 configurations were generated by perform- 
ing random rotations on monomers that were compressed 
relative to the trimer minimum geometry. As shown 
in Figure [3j this repulsive training set evenly covers O- 
O distances from 1.8 A to 3.2 A. Many of these com- 
pressed configurations, however, correspond to trimers 
that would practically never be sampled in simulations 
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FIG. 4. Distribution of trimer binding energies in the three- 
body training set. 



of water under ambient conditions. While including these 
configurations in the training set was required to ensure 
the numerical stability of the model, it was also neces- 
sary to guarantee that these highly-repulsive configura- 
tions did not degrade the quality of the fit in the region 
near the minimum. This consideration is discussed in the 
following section. 



C. Testing the accuracy of the three-body fit 

In order to assess the accuracy of any model with 
respect to ab initio data, it would be ideal to param- 
eterize the model based on one set of reference data 
(the "training" set) and validate the model using a sep- 
arate set of data (the "testing" set). However, due 
to the large computational cost associated with BSSE- 
corrected CCSD(T)/aug-cc-pVTZ 3B interactions and 
the fact that the model should be parametrized to as 
large of a reference set as possible, having two distinct 
sets is unfeasible. In order to estimate the bias associated 
with training and testing on the same reference data, the 
complete training set was equally divided into two parts, 
a training set and a testing set. Care was taken to pre- 
serve the relative compositions of each set, e.g., half of 
the 972 configurations extracted from simulations of ice 
were randomly selected and placed in the training set 
while the other half was placed in the testing set. 

Since the goal of HBB2-pol is to describe water from 
small molecular clusters in the gas phase to condensed 
phases, we found it important to decompose the RMS 
into categories: 1) trimers with a binding energy of 
less than 5 kcal/mol, which are particularly important 
for low-temperature cluster isomer relative stabilities, 2) 
trimers with a binding energy of less than 30 kcal/mol, 
which are sampled during path-integral molecular dy- 
namics (PIMD) simulations of bulk water at ambient 
conditions, and 3) the complete training set, regardless of 
binding energy (see Table [I]). Due to the proximity of the 



TABLE I. RMS deviation of three-body interactions from 
trimers with binding energies less than 5kcal/mol, less than 
30kcal/mol, and for the complete training set. See main text 
for details. 





WHBB Vl B d 


HBB2-pol 




RMS for Trainin 


I Set 


Ei,i nt i < 5 
Ebind < 30 

Total 


0.10 0.13 
0.14 0.35 
0.69 2.26 


0.06 
0.10 
0.85 




RMS for Testing 


Set 


Efjind < 5 
Efcind < 30 

Total 


0.10 0.13 
0.15 0.35 
0.68 2.20 


0.06 
0.16 
0.82 


RMS for Complete Set 


Efjind < 5 
'Eibind < 30 

Total 


0.10 0.13 
0.14 0.34 
0.66 2.17 


0.05 
0.11 
0.83 



energies for different isomers of the small clusters, it is 
desirable that 3B interactions corresponding to clusters 
with the lowest binding energies have the lowest RMS. As 



was discussed in Section III B it was necessary to include 
3B energies corresponding to trimers with positive bind- 
ing energies to ensure the stability of HBB2-pol. At the 
same time, care was taken to ensure that the accuracy in 
the low-energy region was not diluted by being needlessly 
accurate for trimers with enormous binding energies that 
will rarely be sampled. By appropriately weighting the 
reference data, HBB2-pol achieves the best accuracy in 
the low-energy region, slightly larger RMS in the inter- 
mediate region (E bind < 5 kcal/mol), and the largest er- 
ror for those configurations with binding energies greater 
than 30 kcal/mol. To obtain this RMS distribution, con- 
figurations with 3B interactions were weighted accord- 
ing to their trimer binding energies, where configurations 
with Ebind < 15 kcal/mol were given a weight of 1.0, 
while weights for configurations binding energies larger 
than 15 kcal/mol a weight of e~ alyEbirld ~ Ea \ where a = 
0.05 kcal/mol -1 and Eq was 15 kcal/mol. 

The results presented in Table [I] confirms the ability of 
the HBB2-pol 3B function to recover the ab initio data 
and demonstrates that the training set is sufficiently large 
to render the model insensitive to the size of the training 
set. This analysis does not, however, probe whether the 
training set includes all the physically relevant configura- 
tions. Assessing whether the composition of the training 
set is biased can only be accomplished by examining ab 
initio properties such as relative cluster isomer stabili- 
ties, experimental properties such as virial coefficients, 
and the overall numerical stability of the model. 
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D. HBB2-pol 



IV. RESULTS 



Using the 3B interaction proposed in Eq. (|2|), HBB2- 
pol has been developed through the many-body expan- 
sion, Eq. ([lj. The spectroscopically-accurate monomer 
potential energy surface of Partridge and Schwenke is 
used for the IB terms.^ For the 2B interaction, the 
HBB2 PES^ is employed at short-range. The HBB2 
short-range 2B interaction is smoothly switched to elec- 
trostatics/induction plus dispersion term at long-range 
over the interval Rj B = 5.5A < R OQ < R 2 F B = 7.5A: 



V 2B = (1 - s 2 )V* B BB2 



S2 



V. 



2B 
elec 



V 



2B 



C 6 



S2 




15f + 6£ 2 



o < e < i 



The V 2B C and V™ 



where £ = (R QO - Ri)/(R F - Ri) 
terms have the same form as TTM4-F, and the value of 
Cq was taken as the difference between the sum of the ab 
initio van der Waals constants describing the dispersion 
and induction interactions in the asymptotic region from 
Ref. 2H1 and the 1 / Rqq coefficient of the isotropic part 
of V% 

C 6 = (47.053232 a.u. + 10.66517 a.u) - 2a fJ 2 , 

(using the isotropic molecular polarizability given by 
TTM4-F, a = 1.41567 A 3 the molecular dipole fjt = 
1.864047 D). The 3B interactions are those presented in 
Eq.([2]), and all the higher-body terms are approximated 
by the induction energy as in TTM4-F, 



yNB = yNB 

ma 



N 

E 

i<j<k 



N 

E 



(7) 



Since V^ B2 accounts for polarization at the 2B level, the 
short-range 2B contribution must be subtracted from the 
N-body induction to prevent double counting. This prob- 
lem does not arise for the 3B interactions since induction 
is not modified at the 3B level (Eq. Q). The HBB2-pol 
interaction energy for N water molecules is thus given by 
the following expression 

N 

EtJ- mer = ^ Vpg (i) 
i 

+ E{ (1 - -a) KIb2 VSR + S2 [Klfc " ^] } 
N 

+ E s 3 V p 3 %(i,j,k) + V^(l,...,N), (8) 

i<j<k 

where the 3B switching functions, s 3 , is given byEq.(§. 



In this section we demonstrate the ability HBB2-pol to 
reproduce CCSD(T) calculations and the experimental 
second and third virial coefficients. 



A. Short-range Three-body Interaction Addresses 
Systematic Flaws in Polarizable Models 

As discussed above, the three-body interactions pri- 
marily originate from induction, though for more 
strongly bound clusters effects including exhange- 
repulsion and charge transfer can also make a signif- 
icant contribution! 68 ! 70 ^ As a consequence, force fields 
which treat only induction are inherently unable to fully 
describe 3B interactions. By contrast, models that 
only treat short-range 3B interactions are unable to de- 
scribe the induction interactions that dominate at long 
range. This is illustrated in Figure [5j where HBB2-pol 
is compared with WHBB, TTM4-F 3B induction, and 
CCSD(T)/aug-cc-pVTZ data along two representative 
cuts through the water trimer PES. The CCSD(T) refer- 
ence data used for this comparison were not included in 
the training set. This comparison clearly shows that the 
addition of the short-range "correction" to the induction 
brings the 3B interactions of HBB2-pol into close agree- 
ment with the CCSD(T) data. 



B. Trimer stationary points 

To assess the combined accuracy of the IB, 2B, and 
3B interactions of HBB2-pol, we studied the relative en- 
ergies of four water trimer isomers identified in Table [n] 
by their free-hydrogen orientation: "u" for pointing up, 
"p" if the hydrogen lies in the plane of the oxygen atoms, 
and "d" for pointing down. The energetics of these struc- 
tures have been reported in Ref. 1751 where geometries 
optimized at the MP2/aug-cc-pVQZ level were used to 
calculate the energies at the CCSD(T) level in the com- 
plete basis limit. The HBB2-pol energies relative to the 
trimer global minimum (uud) are reported in Table |Tlj for 
the geometries optimized on the HBB2-pol PES. These 
favorably interacting trimers have CCSD(T) binding en- 
ergies of approximately -15 kcal/mol, which implies that 
the energies separating these stationary points are on the 
order of 1-10% of the binding energy. The HBB2-pol rel- 
ative energies fall within 0.05 kcal/mol of the reference 
data for the upd and uuu structures, while a larger differ- 
ence of 0.23 kcal/mol is obtained for the ppp structure. 
HBB2-pol, however, is not expected to achieve perfect 
agreement with the CCSD(T)/CBS data 75 due to the 
different basis set used in the fit of the 3B terms. 
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C. Virial coefficients 



3 3.5 4 4.5 5 

Average 0-0 distance (Angstrom) 



FIG. 5. Three-body interaction energy for two cuts through 
the water trimer potential energy surface: WHBB (black 
dashes), three-body induction (blue), HBB2-pol (red) and 
CCSD(T)/aug-cc-pVTZ (crosses). 



TABLE II. Relative energies of water trimer isomers with 
respect to the global minimum "uud" in kcal/mol. CCSD(T) 
energies were extrapolated to complete basis set limit, from 
Ref . [75] 







CCSD(T) 


HBB2-pol 




uud 


0.0 


0.0 


V 


upd 


0.23 


0.18 


Si 


uuu 


0.77 


0.73 




ppp 


1.25 


1.02 



Virial coefficients are derived from the virial equation 
of state that expresses p/ksT as a power series in density 
and gauge deviations from the ideal gas behavior, 



P 



N 
V 



1+Bn 



N 
V 



Bo 



(9) 



Here, B 2 and Bj, are the second and third virial coef- 
ficients, respectively.^^ The second virial coefficient 
depends only on the pair interaction while the third 
virial coefficient also includes the 3B interaction, but no 
(n > 3)-body energies. Since both B 2 and B 3 are ex- 
perimentally accessible, the virial coefficients provide a 
critical assessment of the accuracy of water potentials. 

Neglecting the contribution of intramolecular vibra- 
tional modes (that is, assuming rigid monomers) and 
nuclear quantum effects, the second virial coefficient is 
given by^Sl, 

B 2 (T) = -2k J di? 12 i?? 2 </ 12 )^ 2 , 

/l2 = e -/Jv a *(H„,n 1 ,n a ) _ 1} (10) 

where /3 = l/fc^T is the inverse temperature, R\ 2 
is the distance between the monomer centers of mass, 
V 2B (Ri2, Qi, ^2) is the intermolecular interaction en- 
ergy, and the angular brackets stand for the average over 
the orientations of the molecules Sli i2 . The Mayer func- 
tion, /12, has the useful property of going to zero as the 
molecules move apart. To numerically evaluate Eq. (10 1, 



the Simpson rule is used to calculate the radial compo- 
nent of the integral, while Monte Carlo integration is used 
to evaluate the orientational average using 10 5 random 
orientations of the monomers at each point on the radial 
grid. 

To explore the sensitivity of the second virial coef- 
ficient to the choice of the rigid monomer geometry, 
its values are calculated using two different configura- 
tions: the Born-Oppenheimer minimum energy config- 
uration given by the Partridge-Schwenke potential en- 
ergy surface (r e J H = 0.95784A and 6% OH = 104.508°^, 
and the ground-state vibrationally-averaged configura- 
tion of Tq H — 0.9716256A and reported in reference [501 
(®hoh = 104.69°). Importantly, examining the results 
for these two configurations provides not only an estimate 
of the effect of flexibility, but also of nuclear quantum 
effects "sensed" through the ground-state vibrationally 
averaged configuration. 

Plotted in Figure [6] is the difference between the cal- 
culated virial coefficients and the experimental data.^1 
Since B 2 is the integral of e~~^ v — 1, comparison to low- 
temperature results are particularly interesting as these 
are most sensitive to the region near the dimer mini- 
mum geometry. For the Born-Oppenheimer equilibrium 
geometries (Fig. [6^,), the ab initio-based potential en- 
ergy surfaces WHBB and HBB2-pol very closely repro- 
duce the experimental data. Since HBB2-pol and WHBB 
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FIG. 6. Effect of the rigid monomer configuration on the error in the classical second virial coefficient relative to experiment.^ 
Differences between WHBB and HBB2-pol were indistinguishable on the scale of this plot, so both have been assigned to the 
red line. For E3B, its rigid monomer geometry was used in both plots. 
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(a) Equilibrium monomer geometry 



(b) Vibrationally-averaged monomer geometry 



share the same two-body PES at dimer separations of less 
than 5.5A, it is not surprising that both models predict 
similar values for the second virial coefficient. Though 
detectable, the differences between WHBB and HBB2- 
pol are not visible on the scale of Figure [6j so both have 
been assigned to the red line. 



While both qTIP4P/f and E3B were empirically 
parametrized, the inclusion of explicit 3B interactions in 
E3B allows for a more accurate 2B interaction. Pairwise- 
additive models, such as qTIP4P/f, on the other hand, 
rely on the 2B interaction to (partially) recover 3B ef- 
fects and would therefore be expected to have a much 
larger error for the second virial coefficient. When the 
monomer geometry is changed from the equilibrium to 
the vibrationally-averaged geometry, there is a small de- 



crease in the value of B 2 (T) for most models. TTM3- 
F, however, exhibits a large change in its second virial 
coefficient. This is likely due to the empirical modifi- 
cation of the dipole moment surface, which only affects 
TTM3-F when the monomer distorts from the equilib- 
rium configuration.^ 

The third virial coefficient depends on the interaction 
of trimers and provides an indirect measure of the 3B 
interaction. Following HilPsl, the third virial coefficient 
can be separated into a pairwise component, B 3 (T), and 
the 3B contribution, AB% B (T): 



B 3 (T) = Bl(T) + ABl B (T), 



(11) 



where the pairwise contribution is the integral over the 
product of the three Mayer functions in Eq. (12), and 
the 3B contribution is given by Eq. (13 1. 



B° 3 (T) = -\* 2 J dR n dR 13 Rl 2 R 2 13 (f 12 f 13 f 23 sin^ h3) ) nun 



(12) 



ABl B (T) = - 8 -n*J dR 12 dR 13 R\ 2 Rl,{[e-^ l]e" 



' sin$ 



(2,1,3) 



(13) 



rii ,ri2 ,^3 ,$(2 



Following the work of Tainter et al.P we computed 
the third virial coefficient by fixing one molecule at the 
origin, evaluating two radial integrals through the two- 
dimensional Simpson rule, and using Monte Carlo inte- 
gration for the 9 orientational degrees of freedom and the 
angle $(2,1,3) between the centers of mass of molecules 2, 
1, 3 at each radial grid point (using 10 6 monomer ori- 
entations). This integration strategy was demonstrated 



r 



in Ref . [2H to recover the results from the more efficient 
Mayer sampling approach!^ Our implementation repro- 
duces the data from Ref. [2"4l for the E3B model. 

Rigid, classical third virial coefficients using 
vibrationally-averaged monomer geometries are re- 
ported in figure [7] At lower temperatures, HBB2-pol 
agrees with WHBB. In the high temperature limit, 
however, HBB2-pol compare more favorably with 
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FIG. 7. Rigid, classical third virial coefficient using 
vibrationally-averaged monomer geometries. Experimental 
data from Ref. [52] 



experiment than WHBB, which is consistent with 
HBB2-pol providing a more accurate description of 
the 3B interactions. The errors in the 2B and 3B 
interactions of TTM3-F appear to cancel one another, 
resulting in an third virial coefficient that is remarkably 
close to experiment. Much work has been inve sted in 
exploring the role of nuclear quantum effect ji2| 83 | 84 | 
and monomer flexibility. 85 While the exploration of the 
monomer configuration on the second virial coefficient 
indicates that flexibility and nuclear quantum effects are 
important, these results are by no means conclusive. We 
will pursue a more rigorous characterization of the effect 
of flexibility and nuclear quantization in future work. 



V. SUMMARY 

In this study, the accuracy of several force fields, 
semiempirical methods, DFT, and ab initio-b&sed 
models in reproducing the two- and three-body wa- 
ter interactions was assessed against BSSE-corrected 
CCSD(T)/aug-cc-pVTZ data. Our analysis of the many- 
body expansion of the interaction energy indicates that 
defects inherent to polarizable models, which are non- 
negligable when molecules are close to one another, can 
be effectively corrected through an explicit short-range 
term expressed in terms of permutationally invariant 
polynomials. Based on these findings, we developed a 
new water model, HBB2-pol, that is derived entirely 
from "first principles" . HBB2-pol achieves excellent ac- 
curacy with respect to the CCSD(T) data for the two- 
and three-body interactions, isomer relative energies of 
small clusters, and second and third virial coefficients. 
Importantly, the inclusion of explicit polarization in the 
three-body interaction term enables the use of relatively 
low-degree polynomials, which, in turn, results in a sig- 
nificant decrease in the computational cost associated 
with HBB2-pol relative to other ab initio-hased models. 



Through its combined accuracy and computational effi- 
ciency, HBB2-pol thus opens the doorway to fully "first 
principles" simulations of water in the condensed phases, 
which will help resolve current controversies^HSH. 
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